R Basics

Interactive R

R is very handy for its interactive command line interface. Later we will also explore how to make R reusable with scripts, but for now we will focus on typing at the command prompt to get comfortable.

To get started, type ‘R’ at your command line.

What version of R do you have? The banner printed when R starts reports it, or you can type the following at the R prompt:

version
##                _                           
## platform       x86_64-w64-mingw32          
## arch           x86_64                      
## os             mingw32                     
## system         x86_64, mingw32             
## status                                     
## major          3                           
## minor          5.1                         
## year           2018                        
## month          07                          
## day            02                          
## svn rev        74947                       
## language       R                           
## version.string R version 3.5.1 (2018-07-02)
## nickname       Feather Spray

With the R command prompt open, we can now get started.

x=2
print(x) ##Print method
## [1] 2
class(x)
## [1] "numeric"
x=1:10 # Create a vector of the integers 1 through 10
class(x)
## [1] "integer"
print(x)
##  [1]  1  2  3  4  5  6  7  8  9 10
print(x[1]) # First index of vector
## [1] 1
print(x[1:5])
## [1] 1 2 3 4 5
y = matrix(nrow=5, ncol=5) # create a 5x5 matrix
print(y)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]   NA   NA   NA   NA   NA
## [2,]   NA   NA   NA   NA   NA
## [3,]   NA   NA   NA   NA   NA
## [4,]   NA   NA   NA   NA   NA
## [5,]   NA   NA   NA   NA   NA
class(y)
## [1] "matrix"
y[1,1] = 5
print(y)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    5   NA   NA   NA   NA
## [2,]   NA   NA   NA   NA   NA
## [3,]   NA   NA   NA   NA   NA
## [4,]   NA   NA   NA   NA   NA
## [5,]   NA   NA   NA   NA   NA
y[,1]= x[1:5]
print(y)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1   NA   NA   NA   NA
## [2,]    2   NA   NA   NA   NA
## [3,]    3   NA   NA   NA   NA
## [4,]    4   NA   NA   NA   NA
## [5,]    5   NA   NA   NA   NA
class(y[,1])
## [1] "numeric"
y = cbind(1:5, # bind five identical columns side by side
          1:5,
          1:5,
          1:5,
          1:5)

class(y)
## [1] "matrix"
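
The vector and matrix steps above can be condensed into a short sketch. It uses base R only; the names `v`, `first_five`, `m`, and `m2` are illustrative and do not appear in the session above.

```r
# Illustrative names only; nothing here depends on the session above.
v <- 1:10                 # the colon operator builds an integer sequence
first_five <- v[1:5]      # R indexes from 1, so this takes elements 1 through 5

# Build the same 5x5 matrix two ways.
m  <- cbind(1:5, 1:5, 1:5, 1:5, 1:5)          # bind five columns side by side
m2 <- matrix(rep(1:5, times = 5), nrow = 5)   # matrix() fills column by column

print(dim(m))    # number of rows, then number of columns
print(m[2, ])    # second row (leave the column index empty)
print(m[, 3])    # third column (leave the row index empty)
```

Leaving one index empty inside the brackets selects the whole row or column, which is the same trick used in `y[,1]` above.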

Data input/output

Getting data and setting working directory:

Throughout this semester we will be using small shared data files that I am storing on our course development GitHub repository. Go to https://github.com/rsh249/bioinformatics.git and download the repository. Unpack it somewhere accessible to you (e.g., your Documents or Desktop folder). Then:

setwd('/path/to/repository')

Read table/tab/csv/txt text files:

read.table(), read.csv(), and read.delim()

cars = read.table('./data/mtcars.csv', header=TRUE, sep = ',') # Read a comma separated values file
head(cars)

cars = read.csv('./data/mtcars.csv')
cars2 = read.csv2('./data/mtcars.csv') ## Interesting behavior here: read.csv2 expects ';' separators and ',' decimals, so this comma separated file is read as a single column

cars = read.delim('./data/mtcars.csv', sep=',')
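
To see how the separator defaults differ without needing the course repository, here is a minimal sketch that writes a small comma separated file to a temporary location and reads it back three ways. The temporary file and the names `a`, `b`, and `semi` are my own for illustration; the data are the first rows of the `mtcars` dataset that ships with R.

```r
# Write a small comma separated file to a temporary location (illustrative).
tmp <- tempfile(fileext = ".csv")
write.csv(head(mtcars), tmp, row.names = FALSE, quote = FALSE)

a    <- read.table(tmp, header = TRUE, sep = ",")  # separator given explicitly
b    <- read.csv(tmp)                              # defaults: header = TRUE, sep = ","
semi <- read.csv2(tmp)                             # defaults: sep = ";", dec = ","

print(dim(a))     # rows and columns found with an explicit comma separator
print(dim(b))     # same shape using the read.csv defaults
print(dim(semi))  # every line lands in a single column: the file has no semicolons

unlink(tmp)       # remove the temporary file
```

Checking `dim()` or `head()` right after reading a file is a quick way to catch a wrong separator.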

Basic plotting

One of R’s biggest advantages is the ability to create high quality graphics in nearly any format or style. Today we will be working with the basic plotting features, but later we will take a look at the ggplot2 package, which is one of the most widely used graphics packages for R.

head(cars)
##               model  mpg cyl disp  hp drat    wt  qsec vs am gear carb
## 1         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## 2     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## 3        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## 4    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## 5 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## 6           Valiant 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
plot(cars)

OK. That was not so great. Let’s try something more useful for visualizing these data. We can tell plot() which columns we want to create a scatterplot for:

colnames(cars)
##  [1] "model" "mpg"   "cyl"   "disp"  "hp"    "drat"  "wt"    "qsec" 
##  [9] "vs"    "am"    "gear"  "carb"
plot(cars[,'cyl'], cars[,'mpg'])

Or we can create other types of plots by calling other functions, e.g., a histogram or a boxplot:

hist(cars[,'mpg'])

boxplot(cars[,'hp'])
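
The same three plot types can be dressed up with labels and grouping. This is a sketch that assumes the built-in `mtcars` dataset (shipped with R, same measurements as our course file) rather than the `cars` object read in above; the axis labels and titles are my own wording.

```r
# Built-in dataset shipped with R, so no file is needed.
data(mtcars)

# Scatterplot with axis labels and a title
plot(mtcars$cyl, mtcars$mpg,
     xlab = "Number of cylinders", ylab = "Miles per gallon",
     main = "Fuel economy by cylinder count")

# hist() also returns the bins it drew, so the counts can be inspected
h <- hist(mtcars$mpg, xlab = "Miles per gallon", main = "Distribution of mpg")
print(h$counts)

# The formula interface draws one box per group (here, one per cylinder count)
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Number of cylinders", ylab = "Miles per gallon")
```

The `xlab`, `ylab`, and `main` arguments work across most base plotting functions, so it is worth getting in the habit of setting them.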

Loops

Repeating tasks using loops

for(i in 1:10) {
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10

Catch loop output in a vector or list

li = numeric(10) # preallocate a numeric vector to hold the results
for(i in 1:10){
  li[i]=log(i) # store the result in the i-th element
}
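
Here is a slightly fuller sketch of catching loop output, once in a vector and once in a list. The names `n`, `out`, and `res` are illustrative; the main idea is to create the container at its final size before the loop starts.

```r
n <- 10
out <- numeric(n)           # preallocate a numeric vector of zeros
for (i in seq_len(n)) {
  out[i] <- log(i)          # single brackets assign one vector element
}

# log() is vectorized, so the loop gives the same values as a single call
print(all.equal(out, log(1:n)))

# A list can hold results of different lengths or types
res <- vector("list", 3)    # preallocate a list with three empty slots
for (i in seq_along(res)) {
  res[[i]] <- seq_len(i)    # double brackets assign one list element
}
print(lengths(res))         # number of items stored in each slot
```

Use a vector when every result is a single value of the same type, and a list when the results vary in size or type.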

apply family functions

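The apply family of functions in R provides a compact way to repeat an operation over the rows, columns, or elements of an object. They are usually more concise than for loops and can be faster than loops that grow their output one element at a time.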

print(y) #our matrix from earlier
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    1    1    1    1
## [2,]    2    2    2    2    2
## [3,]    3    3    3    3    3
## [4,]    4    4    4    4    4
## [5,]    5    5    5    5    5
y = as.data.frame(y)
li1 = apply(y, 1, sum) # row-wise
li2 = apply(y, 2, sum) # column-wise

li3 = lapply(y[,1], log) #returns list
li4 = sapply(y[,1], log) #returns vector

#replicate an operation, a wrapper for sapply
#(named reps so it does not share a name with the base function rep())
reps = replicate(10, log(y[,1]))
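
To make the differences between these functions easier to see, here is a self-contained sketch. It rebuilds the matrix under the illustrative name `m`, and it adds `rowSums()`, `colSums()`, and `vapply()`, which are base R functions not used above.

```r
m <- cbind(1:5, 1:5, 1:5, 1:5, 1:5)   # same values as our matrix from earlier

row_totals <- apply(m, 1, sum)   # MARGIN = 1: one result per row
col_totals <- apply(m, 2, sum)   # MARGIN = 2: one result per column

# rowSums() and colSums() are dedicated shortcuts for these two cases
print(all(row_totals == rowSums(m)))
print(all(col_totals == colSums(m)))

as_list   <- lapply(m[, 1], log)              # always returns a list
as_vector <- sapply(m[, 1], log)              # simplifies to a vector when it can
checked   <- vapply(m[, 1], log, numeric(1))  # like sapply, with the result type declared

# replicate() re-evaluates the expression each time, which suits random draws
reps <- replicate(3, mean(rnorm(10)))
print(length(reps))
```

`vapply()` is a reasonable default inside scripts because it stops with an error if a result is not the type you declared.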

Something Fun:

ggmap

A subset of R packages known as the tidyverse provides loads of useful tools. Here’s how to use some of those, together with the ggmap package, to make cool-looking maps from Google Maps data. This is a great example of the power of R’s community. I would have no idea where to start to make maps like these from scratch. But we do not have to start from nothing because functions like these exist. This is the “cookbook” approach (just follow the instructions) and it can be highly effective.

library(tidyverse)
## -- Attaching packages ----------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.0.0     v purrr   0.2.5
## v tibble  1.4.2     v dplyr   0.7.6
## v tidyr   0.8.1     v stringr 1.3.1
## v readr   1.1.1     v forcats 0.3.0
## -- Conflicts -------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(mapdata)
## Loading required package: maps
## 
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
## 
##     map
library(maps)
library(ggmap)
library(magrittr)
## 
## Attaching package: 'magrittr'
## The following object is masked from 'package:ggmap':
## 
##     inset
## The following object is masked from 'package:purrr':
## 
##     set_names
## The following object is masked from 'package:tidyr':
## 
##     extract

If any of these fail, install the missing package and try again, e.g., install.packages('tidyverse').
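
One way to check all of the packages at once is sketched below. The names `pkgs`, `installed`, and `missing_pkgs` are my own; the install line is left commented out so nothing is downloaded unless you choose to run it.

```r
# Packages loaded above
pkgs <- c("tidyverse", "mapdata", "maps", "ggmap", "magrittr")

# requireNamespace() returns TRUE if a package is installed and can be loaded
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install them (needs an internet connection):
  # install.packages(missing_pkgs)
} else {
  message("All packages are installed.")
}
```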

One of the best parts of these tools is the built-in access to Google Maps aerial imagery.

loc = cbind(-73.973917, 40.781799)
loc = as.data.frame(loc)
colnames(loc) = c('lon', 'lat')
bkmap <- get_map(location = loc, maptype = "satellite", source = "google", zoom =14)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=40.781799,-73.973917&zoom=14&size=640x640&scale=2&maptype=satellite&language=en-EN&sensor=false
ggmap(bkmap) +
  geom_point(data = loc,
             color = "red",
             size = 4)

bkmap3 <- get_map(location = loc, maptype = "terrain", source = "google", zoom = 12)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=40.781799,-73.973917&zoom=12&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
ggmap(bkmap3) +
  geom_point(data = loc,
             color = "red",
             size = 4)

bkmap4 <- get_map(location = loc, maptype = "toner-lite", source = "google", zoom = 10)
## maptype = "toner-lite" is only available with source = "stamen".
## resetting to source = "stamen"...
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=40.781799,-73.973917&zoom=10&size=640x640&scale=2&maptype=terrain&sensor=false
## Map from URL : http://tile.stamen.com/toner-lite/10/300/383.png
## Map from URL : http://tile.stamen.com/toner-lite/10/301/383.png
## Map from URL : http://tile.stamen.com/toner-lite/10/302/383.png
## Map from URL : http://tile.stamen.com/toner-lite/10/300/384.png
## Map from URL : http://tile.stamen.com/toner-lite/10/301/384.png
## Map from URL : http://tile.stamen.com/toner-lite/10/302/384.png
## Map from URL : http://tile.stamen.com/toner-lite/10/300/385.png
## Map from URL : http://tile.stamen.com/toner-lite/10/301/385.png
## Map from URL : http://tile.stamen.com/toner-lite/10/302/385.png
ggmap(bkmap4) +
  geom_point(data = loc,
             color = "red",
             size = 4)

Homework Assignment

  1. Create a map like one of these with your hometown at the center of it and post it to #maps

  2. Working in groups of 2-4, design a small data collection project that you can carry out in nicer weather. Go outside and observe something in nature that you can take quantitative measurements on. Record ~20 measurements per member. Agree on the type of observation and measurement ahead of time and bring the data to class on Wednesday for more plotting in R. Consider recording a categorical value too (e.g., measure leaf length for 2 types of plants; count flower petal number for 4 types of flowers; count number of students standing in line at DD vs Starbucks).